Abstract. We investigate the problem of inconsistency measurement
on large knowledge bases by considering stream-based inconsistency
measurement, i.e., we investigate inconsistency measures
that cannot consider a knowledge base as a whole but process it
within a stream. For that, we present, first, a novel inconsistency
measure that is apt to be applied to the streaming case and, second,
stream-based approximations for the new and some existing inconsistency
measures. We conduct an extensive empirical analysis on the
behavior of these inconsistency measures on large knowledge bases,
in terms of runtime, accuracy, and scalability. We conclude that for
two of these measures, the approximation of the new inconsistency
measure and an approximation of the contension inconsistency measure,
large-scale inconsistency measurement is feasible.

1 Introduction 

Inconsistency measurement [?] is a subfield of Knowledge Representation
and Reasoning (KR) that is concerned with the quantitative assessment
of the severity of inconsistencies in knowledge bases. Consider
the following two knowledge bases $\mathcal{K}_1$ and $\mathcal{K}_2$ formalized in
propositional logic:

$\mathcal{K}_1 = \{a,\ b \lor c,\ \neg a \land \neg b,\ d\}$        $\mathcal{K}_2 = \{a,\ \neg a,\ b,\ \neg b\}$

Both knowledge bases are classically inconsistent as for $\mathcal{K}_1$ we
have $\{a, \neg a \land \neg b\} \models \bot$ and for $\mathcal{K}_2$ we have, e.g., $\{a, \neg a\} \models \bot$.
These inconsistencies render the knowledge bases useless for reasoning
if one wants to use classical reasoning techniques. In order
to make the knowledge bases useful again, one can either use
non-monotonic/paraconsistent reasoning techniques [?] or one revises
the knowledge bases appropriately to make them consistent [?].
Looking again at the knowledge bases $\mathcal{K}_1$ and $\mathcal{K}_2$ one can observe
that the severity of their inconsistency is different. In $\mathcal{K}_1$, only two
out of four formulas ($a$ and $\neg a \land \neg b$) participate in making $\mathcal{K}_1$
inconsistent while for $\mathcal{K}_2$ all formulas contribute to its inconsistency.
Furthermore, for $\mathcal{K}_1$ only two propositions ($a$ and $b$) participate in
a conflict and using, e.g., paraconsistent reasoning one could still
infer meaningful statements about $c$ and $d$. For $\mathcal{K}_2$ no such statement
can be made. This leads to the assessment that $\mathcal{K}_2$ should be
regarded as more inconsistent than $\mathcal{K}_1$. Inconsistency measures can be
used to quantitatively assess the inconsistency of knowledge bases
and to provide a guide for how to repair them, cf. [?]. Moreover, they
can be used as an analytical tool to assess the quality of knowledge
representation. For example, one simple inconsistency measure is to
take the number of minimal inconsistent subsets (MIs) as an indicator
for the inconsistency: the more MIs a knowledge base contains, the
more inconsistent it is. For $\mathcal{K}_1$ we then have 1 as its inconsistency
value and for $\mathcal{K}_2$ we have 2.

In this paper, we consider the computational problems of inconsistency
measurement, particularly with respect to scalable inconsistency
measurement on large knowledge bases, as they appear in,
e.g., Semantic Web applications. To this end we present a novel
inconsistency measure $\mathcal{I}_{hs}$ that approximates the $\eta$-inconsistency
measure from [8] and is particularly apt to be applied to large knowledge
bases. This measure is based on the notion of a hitting set which (in
our context) is a minimal set of classical interpretations such that every
formula of a knowledge base is satisfied by at least one element
of the set. In order to investigate the problem of measuring inconsistency
in large knowledge bases we also present a stream-based processing
framework for inconsistency measurement. More precisely,
the contributions of this paper are as follows:

1. We present a novel inconsistency measure $\mathcal{I}_{hs}$ based on hitting
sets and show how this measure relates to other measures and, in
particular, that it is a simplification of the $\eta$-inconsistency measure
[8] (Section 3).

2. We formalize a theory of inconsistency measurement in streams
and provide approximations of several inconsistency measures for
the streaming case (Section 4).

3. We conduct an extensive empirical study on the behavior of those
inconsistency measures in terms of runtime, accuracy, and scalability.
In particular, we show that the stream variants of $\mathcal{I}_{hs}$ and of
the contension measure $\mathcal{I}_c$ are effective and accurate for measuring
inconsistency in the streaming setting and, therefore, in large
knowledge bases (Section 5).

We give necessary preliminaries for propositional logic and inconsistency
measurement in Section 2 and conclude the paper with a
discussion in Section 6. Proofs of technical results can be found in
Appendix A.

2 Preliminaries 

Let $\mathsf{At}$ be a propositional signature, i.e., a (finite) set of propositions,
and let $\mathcal{L}(\mathsf{At})$ be the corresponding propositional language,
constructed using the usual connectives $\land$ (and), $\lor$ (or), and $\neg$ (negation).
We use the symbol $\bot$ to denote contradiction. Then a knowledge
base $\mathcal{K}$ is a finite set of formulas $\mathcal{K} \subseteq \mathcal{L}(\mathsf{At})$. Let $\mathbb{K}(\mathsf{At})$ be
the set of all knowledge bases. We write $\mathbb{K}$ instead of $\mathbb{K}(\mathsf{At})$ when
there is no ambiguity regarding the signature. Semantics to $\mathcal{L}(\mathsf{At})$ is
given by interpretations $\omega : \mathsf{At} \to \{\mathsf{true}, \mathsf{false}\}$. Let $\mathsf{Int}(\mathsf{At})$ denote
the set of all interpretations for $\mathsf{At}$. An interpretation $\omega$ satisfies (or
is a model of) an atom $a \in \mathsf{At}$, denoted by $\omega \models a$ (or $\omega \in \mathsf{Mod}(a)$),
if and only if $\omega(a) = \mathsf{true}$. Both $\models$ and $\mathsf{Mod}(\cdot)$ are extended to
arbitrary formulas, sets, and knowledge bases as usual.

Inconsistency measures are functions $\mathcal{I} : \mathbb{K} \to [0, \infty)$ that aim
at assessing the severity of the inconsistency in a knowledge base $\mathcal{K}$,
cf. [?]. The basic idea is that the larger the inconsistency in $\mathcal{K}$ the
larger the value $\mathcal{I}(\mathcal{K})$. However, inconsistency is a concept that is
not easily quantified and there have been a couple of proposals for
inconsistency measures so far, see e.g. [?]. There are
two main paradigms for assessing inconsistency [?], the first being
based on the (number of) formulas needed to produce inconsistencies
and the second being based on the proportion of the language
that is affected by the inconsistency. Below we recall some popular
measures from both categories but we first introduce some necessary
notations. Let $\mathcal{K} \in \mathbb{K}$ be some knowledge base.

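Definition 1. A set $M \subseteq \mathcal{K}$ is called minimal inconsistent subset
(MI) of $\mathcal{K}$ if $M \models \bot$ and there is no $M' \subset M$ with $M' \models \bot$. Let
$\mathsf{MI}(\mathcal{K})$ be the set of all MIs of $\mathcal{K}$.

Definition 2. A formula $\alpha \in \mathcal{K}$ is called free formula of $\mathcal{K}$ if there
is no $M \in \mathsf{MI}(\mathcal{K})$ with $\alpha \in M$. Let $\mathsf{Free}(\mathcal{K})$ denote the set of all
free formulas of $\mathcal{K}$.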
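Definitions 1 and 2 can be made concrete with a small brute-force sketch. It is only an illustration under our own assumptions: formulas are encoded as Python predicates over an interpretation dictionary, the labels and helper names (`consistent`, `minimal_inconsistent_subsets`, `free_formulas`) are hypothetical, and consistency is checked by enumerating all interpretations, which is feasible only for toy signatures.

```python
from itertools import combinations, product

ATOMS = ["a", "b", "c", "d"]

# Formulas as (label, predicate over an interpretation dict); encoding is our own.
K1 = [
    ("a", lambda w: w["a"]),
    ("b | c", lambda w: w["b"] or w["c"]),
    ("~a & ~b", lambda w: not w["a"] and not w["b"]),
    ("d", lambda w: w["d"]),
]
K2 = [
    ("a", lambda w: w["a"]),
    ("~a", lambda w: not w["a"]),
    ("b", lambda w: w["b"]),
    ("~b", lambda w: not w["b"]),
]

def consistent(formulas):
    """True if some classical interpretation satisfies every formula."""
    for values in product([True, False], repeat=len(ATOMS)):
        w = dict(zip(ATOMS, values))
        if all(f(w) for _, f in formulas):
            return True
    return False

def minimal_inconsistent_subsets(kb):
    """Enumerate subsets by increasing size; an inconsistent subset that
    contains no previously found MI is itself minimal."""
    found = []
    for size in range(1, len(kb) + 1):
        for idx in combinations(range(len(kb)), size):
            if any(set(m) <= set(idx) for m in found):
                continue
            if not consistent([kb[i] for i in idx]):
                found.append(idx)
    return [[kb[i][0] for i in m] for m in found]

def free_formulas(kb):
    """Formulas that occur in no minimal inconsistent subset."""
    in_some_mi = {label for m in minimal_inconsistent_subsets(kb) for label in m}
    return [label for label, _ in kb if label not in in_some_mi]
```

For the two knowledge bases from the introduction this yields one MI for the first and two for the second, and the free formulas of the first are `b | c` and `d`.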

We adopt the following definition of a (basic) inconsistency measure
from [?].

Definition 3. A basic inconsistency measure is a function $\mathcal{I} : \mathbb{K} \to [0, \infty)$
that satisfies the following three conditions:

1. $\mathcal{I}(\mathcal{K}) = 0$ if and only if $\mathcal{K}$ is consistent,

2. if $\mathcal{K} \subseteq \mathcal{K}'$ then $\mathcal{I}(\mathcal{K}) \le \mathcal{I}(\mathcal{K}')$, and

3. for all $\alpha \in \mathsf{Free}(\mathcal{K})$ we have $\mathcal{I}(\mathcal{K}) = \mathcal{I}(\mathcal{K} \setminus \{\alpha\})$.

The first property (also called consistency) of a basic inconsistency
measure ensures that all consistent knowledge bases receive a
minimal inconsistency value and every inconsistent knowledge base
receives a positive inconsistency value. The second property (also
called monotony) states that the value of inconsistency can only
increase when adding new information. The third property (also called
free formula independence) states that removing harmless formulas
from a knowledge base (i.e., formulas that do not contribute to the
inconsistency) does not change the value of inconsistency. For the
remainder of this paper we consider the following selection of
inconsistency measures: the MI measure $\mathcal{I}_{MI}$, the MI$^C$ measure $\mathcal{I}_{MI^C}$,
the contension measure $\mathcal{I}_c$, and the $\eta$ measure $\mathcal{I}_\eta$, which will be
defined below, cf. [?]. In order to define the contension measure
$\mathcal{I}_c$ we need to consider three-valued interpretations for propositional
logic [?]. A three-valued interpretation $\upsilon$ on $\mathsf{At}$ is a function
$\upsilon : \mathsf{At} \to \{T, F, B\}$ where the values $T$ and $F$ correspond to the
classical true and false, respectively. The additional truth value $B$
stands for both and is meant to represent a conflicting truth value
for a proposition. The function $\upsilon$ is extended to arbitrary formulas
as shown in Table 1. Then, an interpretation $\upsilon$ satisfies a formula $\alpha$,
denoted by $\upsilon \models^3 \alpha$, if either $\upsilon(\alpha) = T$ or $\upsilon(\alpha) = B$.

For defining the $\eta$-inconsistency measure [?] we need to consider
probability functions $P$ of the form $P : \mathsf{Int}(\mathsf{At}) \to [0,1]$ with
$\sum_{\omega \in \mathsf{Int}(\mathsf{At})} P(\omega) = 1$. Let $\mathcal{P}(\mathsf{At})$ be the set of all those probability
functions and for a given probability function $P \in \mathcal{P}(\mathsf{At})$ define
the probability of an arbitrary formula $\alpha$ via $P(\alpha) = \sum_{\omega \models \alpha} P(\omega)$.

Definition 4. Let $\mathcal{I}_{MI}$, $\mathcal{I}_{MI^C}$, $\mathcal{I}_c$, and $\mathcal{I}_\eta$ be defined via

\begin{align*}
\mathcal{I}_{MI}(\mathcal{K}) &= |\mathsf{MI}(\mathcal{K})|, \\
\mathcal{I}_{MI^C}(\mathcal{K}) &= \sum_{M \in \mathsf{MI}(\mathcal{K})} \frac{1}{|M|}, \\
\mathcal{I}_c(\mathcal{K}) &= \min\{|\upsilon^{-1}(B)| \mid \upsilon \models^3 \mathcal{K}\}, \\
\mathcal{I}_\eta(\mathcal{K}) &= 1 - \max\{\xi \mid \exists P \in \mathcal{P}(\mathsf{At}) : \forall \alpha \in \mathcal{K} : P(\alpha) \ge \xi\}.
\end{align*}

The measure $\mathcal{I}_{MI}$ takes the number of minimal inconsistent subsets
of a knowledge base as an indicator for the amount of inconsistency:
the more minimal inconsistent subsets the more severe the inconsistency.
The measure $\mathcal{I}_{MI^C}$ refines this idea by also taking the size
of the minimal inconsistent subsets into account. Here the idea is
that larger minimal inconsistent subsets are less severe than
smaller minimal inconsistent subsets (the fewer formulas are needed to
produce an inconsistency the more "obvious" the inconsistency). The
measure $\mathcal{I}_c$ considers the set of three-valued models of a knowledge
base (which is always non-empty) and uses the minimal number of
propositions with conflicting truth value as an indicator for inconsistency.
Finally, the measure $\mathcal{I}_\eta$ (which always assigns an inconsistency
value between 0 and 1) looks for the maximal probability one
can assign to every formula of a knowledge base. All these measures
are basic inconsistency measures as defined in Definition 3.

Example 1. For the knowledge bases $\mathcal{K}_1 = \{a, b \lor c, \neg a \land \neg b, d\}$
and $\mathcal{K}_2 = \{a, \neg a, b, \neg b\}$ from the introduction we obtain
$\mathcal{I}_{MI}(\mathcal{K}_1) = 1$, $\mathcal{I}_{MI^C}(\mathcal{K}_1) = 0.5$, $\mathcal{I}_c(\mathcal{K}_1) = 2$, $\mathcal{I}_\eta(\mathcal{K}_1) = 0.5$,
$\mathcal{I}_{MI}(\mathcal{K}_2) = 2$, $\mathcal{I}_{MI^C}(\mathcal{K}_2) = 1$, $\mathcal{I}_c(\mathcal{K}_2) = 2$, $\mathcal{I}_\eta(\mathcal{K}_2) = 0.5$.


For a more detailed introduction to inconsistency measures see
e.g. [?] and for some recent developments see e.g. [?].

As for computational complexity, the problem of computing an
inconsistency value wrt. any of the above inconsistency measures
is at least FNP-hard (see footnote 3) as it contains a satisfiability
problem as a subproblem.

3 An Inconsistency Measure based on Hitting Sets 

The basic idea of our novel inconsistency measure $\mathcal{I}_{hs}$ is inspired by
the measure $\mathcal{I}_\eta$ which seeks a probability function that maximizes
the probability of all formulas of a knowledge base. Basically, the
measure $\mathcal{I}_\eta$ looks for a minimal number of models of parts of the
knowledge base and maximizes their probability in order to maximize
the probability of the formulas. By just considering this basic
idea we arrive at the notion of a hitting set for inconsistent knowledge
bases.

Definition 5. A subset $H \subseteq \mathsf{Int}(\mathsf{At})$ is called a hitting set of $\mathcal{K}$ if
for every $\alpha \in \mathcal{K}$ there is $\omega \in H$ with $\omega \models \alpha$. $H$ is called a
card-minimal hitting set if it is minimal wrt. cardinality. Let $h_{\mathcal{K}}$ be the
cardinality of any card-minimal hitting set (define $h_{\mathcal{K}} = \infty$ if there
does not exist a hitting set of $\mathcal{K}$).

Definition 6. The function $\mathcal{I}_{hs} : \mathbb{K} \to [0, \infty]$ is defined via
$\mathcal{I}_{hs}(\mathcal{K}) = h_{\mathcal{K}} - 1$ for every $\mathcal{K} \in \mathbb{K}$.

Note that if a knowledge base $\mathcal{K}$ contains a contradictory formula
(e.g. $a \land \neg a$) we have $\mathcal{I}_{hs}(\mathcal{K}) = \infty$. In the following, we assume
that $\mathcal{K}$ contains no such contradictory formulas.

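Example 2. Consider the knowledge base $\mathcal{K}_3$ defined via

$\mathcal{K}_3 = \{a \lor d,\ a \land b \land c,\ b,\ \neg b \lor \neg a,\ a \land b \land \neg c,\ a \land \neg b \land c\}$

Then $\{\omega_1, \omega_2, \omega_3\} \subseteq \mathsf{Int}(\mathsf{At})$ with $\omega_1(a) = \omega_1(b) = \omega_1(c) = \mathsf{true}$,
$\omega_2(a) = \omega_2(c) = \mathsf{true}$, $\omega_2(b) = \mathsf{false}$, and $\omega_3(a) = \omega_3(b) = \mathsf{true}$,
$\omega_3(c) = \mathsf{false}$ is a card-minimal hitting set for $\mathcal{K}_3$ and therefore
$\mathcal{I}_{hs}(\mathcal{K}_3) = 2$. Note that for the knowledge bases $\mathcal{K}_1$ and $\mathcal{K}_2$ from
Example 1 we have $\mathcal{I}_{hs}(\mathcal{K}_1) = \mathcal{I}_{hs}(\mathcal{K}_2) = 1$.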
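The value in Example 2 can be reproduced with an exhaustive search over sets of interpretations. The sketch below is not the implementation evaluated later; it assumes formulas encoded as Python predicates, reads the last formula of the example as $a \land \neg b \land c$ (which is what $\omega_2$ suggests), and uses hypothetical helper names. It is exponential in the signature size and meant for toy inputs with a non-empty knowledge base only.

```python
from itertools import combinations, product

ATOMS = ["a", "b", "c", "d"]

K3 = [
    lambda w: w["a"] or w["d"],
    lambda w: w["a"] and w["b"] and w["c"],
    lambda w: w["b"],
    lambda w: (not w["b"]) or (not w["a"]),
    lambda w: w["a"] and w["b"] and not w["c"],
    lambda w: w["a"] and not w["b"] and w["c"],
]
K2 = [
    lambda w: w["a"],
    lambda w: not w["a"],
    lambda w: w["b"],
    lambda w: not w["b"],
]

def hitting_number(kb, atoms):
    """Size of a card-minimal hitting set, or infinity if none exists."""
    worlds = [dict(zip(atoms, v)) for v in product([True, False], repeat=len(atoms))]
    # For each interpretation, the indices of the formulas it satisfies.
    sat = [frozenset(j for j, f in enumerate(kb) if f(w)) for w in worlds]
    everything = frozenset(range(len(kb)))
    if frozenset().union(*sat) != everything:
        return float("inf")  # some formula has no model at all
    for size in range(1, len(kb) + 1):
        for combo in combinations(sat, size):
            if frozenset().union(*combo) == everything:
                return size
    return float("inf")

def i_hs(kb, atoms):
    """Hitting-set measure: cardinality of a card-minimal hitting set minus one."""
    return hitting_number(kb, atoms) - 1
```

Calling `i_hs(K3, ATOMS)` returns 2 and `i_hs(K2, ATOMS)` returns 1; a knowledge base containing a contradictory formula yields infinity.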

3 FNP is the generalization of the class NP to functional problems. 



Table 1: Truth tables for propositional three-valued logic [?].

Proposition 1. The function $\mathcal{I}_{hs}$ is a (basic) inconsistency measure.

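The result below shows that $\mathcal{I}_{hs}$ also behaves well with some more
properties mentioned in the literature [?]. For that, we denote with
$\mathsf{At}(F)$ for a formula or a set of formulas $F$ the set of propositions
appearing in $F$. Furthermore, two knowledge bases $\mathcal{K}_1$, $\mathcal{K}_2$ are
semi-extensionally equivalent ($\mathcal{K}_1 \equiv^\sigma \mathcal{K}_2$) if there is a bijection
$\sigma : \mathcal{K}_1 \to \mathcal{K}_2$ such that for all $\alpha \in \mathcal{K}_1$ we have $\alpha \equiv \sigma(\alpha)$.

Proposition 2. The measure $\mathcal{I}_{hs}$ satisfies the following properties:

• If $\alpha \in \mathcal{K}$ is such that $\mathsf{At}(\alpha) \cap \mathsf{At}(\mathcal{K} \setminus \{\alpha\}) = \emptyset$ then $\mathcal{I}_{hs}(\mathcal{K}) =
\mathcal{I}_{hs}(\mathcal{K} \setminus \{\alpha\})$ (safe formula independence).

• If $\mathcal{K} \equiv^\sigma \mathcal{K}'$ then $\mathcal{I}_{hs}(\mathcal{K}) = \mathcal{I}_{hs}(\mathcal{K}')$ (irrelevance of syntax).

• If $\alpha \models \beta$ and $\alpha \not\models \bot$ then $\mathcal{I}_{hs}(\mathcal{K} \cup \{\alpha\}) \ge \mathcal{I}_{hs}(\mathcal{K} \cup \{\beta\})$
(dominance).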

The measure $\mathcal{I}_{hs}$ can also be nicely characterized by a consistent
partitioning of a knowledge base.

Definition 7. A set $\Phi = \{\Phi_1, \ldots, \Phi_n\}$ with $\Phi_1 \cup \ldots \cup \Phi_n = \mathcal{K}$
and $\Phi_i \cap \Phi_j = \emptyset$ for $i, j = 1, \ldots, n$, $i \ne j$, is called a partitioning
of $\mathcal{K}$. A partitioning $\Phi = \{\Phi_1, \ldots, \Phi_n\}$ is consistent if $\Phi_i \not\models \bot$ for
$i = 1, \ldots, n$. A consistent partitioning $\Phi$ is called card-minimal if
it is minimal wrt. cardinality among all consistent partitionings of $\mathcal{K}$.

Proposition 3. A consistent partitioning $\Phi$ is a card-minimal partitioning
of $\mathcal{K}$ if and only if $\mathcal{I}_{hs}(\mathcal{K}) = |\Phi| - 1$.

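As $\mathcal{I}_{hs}$ is inspired by $\mathcal{I}_\eta$ we go on by comparing these two measures.

Proposition 4. Let $\mathcal{K}$ be a knowledge base. If $\infty > \mathcal{I}_{hs}(\mathcal{K}) > 0$
then

\[
1 - \frac{1}{\mathcal{I}_{hs}(\mathcal{K})} \;<\; \mathcal{I}_\eta(\mathcal{K}) \;\le\; 1 - \frac{1}{\mathcal{I}_{hs}(\mathcal{K}) + 1}
\]

Note that for $\mathcal{I}_{hs}(\mathcal{K}) = 0$ we always have $\mathcal{I}_\eta(\mathcal{K}) = 0$ as well, as
both are basic inconsistency measures.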


Corollary 1. If $\mathcal{I}_\eta(\mathcal{K}_1) < \mathcal{I}_\eta(\mathcal{K}_2)$ then $\mathcal{I}_{hs}(\mathcal{K}_1) \le \mathcal{I}_{hs}(\mathcal{K}_2)$.


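However, the measures $\mathcal{I}_\eta$ and $\mathcal{I}_{hs}$ are not equivalent as the
following example shows.


Example 3. Consider the knowledge bases $\mathcal{K}_1 = \{a, \neg a\}$ and
$\mathcal{K}_2 = \{a, b, \neg a \lor \neg b\}$. Then we have $\mathcal{I}_{hs}(\mathcal{K}_1) = \mathcal{I}_{hs}(\mathcal{K}_2) = 1$
but $\mathcal{I}_\eta(\mathcal{K}_1) = 0.5 > 1/3 = \mathcal{I}_\eta(\mathcal{K}_2)$.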
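As a quick check, the values of Example 3 can be placed into the bounds of Proposition 4, read here as $1 - 1/\mathcal{I}_{hs}(\mathcal{K}) < \mathcal{I}_\eta(\mathcal{K}) \le 1 - 1/(\mathcal{I}_{hs}(\mathcal{K})+1)$ (this reading of the extracted formula is our assumption). Both knowledge bases have $\mathcal{I}_{hs} = 1$, so the admissible range for $\mathcal{I}_\eta$ is $(0, 1/2]$:

```latex
\begin{align*}
1 - \frac{1}{1} = 0 \;&<\; \mathcal{I}_\eta(\mathcal{K}_1) = \tfrac{1}{2} \;\le\; 1 - \frac{1}{1+1} = \tfrac{1}{2}, \\
0 \;&<\; \mathcal{I}_\eta(\mathcal{K}_2) = \tfrac{1}{3} \;\le\; \tfrac{1}{2}.
\end{align*}
```

The upper bound is attained by $\mathcal{K}_1$, which indicates that the second inequality cannot be strict.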


It follows that the order among knowledge bases induced by $\mathcal{I}_\eta$
is a refinement of the order induced by $\mathcal{I}_{hs}$. However, $\mathcal{I}_{hs}$ is better
suited for approximation in large knowledge bases than $\mathcal{I}_\eta$, cf. the
following section.

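The idea underlying $\mathcal{I}_{hs}$ is also similar to the contension inconsistency
measure $\mathcal{I}_c$. However, these measures are not equivalent as the
following example shows.


Example 4. Consider the knowledge bases $\mathcal{K}_1$ and $\mathcal{K}_2$ given as


$\mathcal{K}_1 = \{a \land b \land c,\ \neg a \land \neg b \land \neg c\}$        $\mathcal{K}_2 = \{a \land b,\ \neg a \land b,\ a \land \neg b\}$

By Definition 6 the card-minimal hitting sets have sizes $h_{\mathcal{K}_1} = 2$ and $h_{\mathcal{K}_2} = 3$,
so we have $\mathcal{I}_{hs}(\mathcal{K}_1) = 1 < 2 = \mathcal{I}_{hs}(\mathcal{K}_2)$ but $\mathcal{I}_c(\mathcal{K}_1) = 3 >
2 = \mathcal{I}_c(\mathcal{K}_2)$.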
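The contension values of Example 4 can be checked by enumerating three-valued interpretations. The sketch assumes that Table 1 follows the standard truth tables of Priest's three-valued logic (order $F < B < T$, conjunction as minimum, disjunction as maximum, negation swapping $T$ and $F$ and fixing $B$); the tuple encoding of formulas and the function names are our own.

```python
from itertools import product

F, B, T = 0, 1, 2  # ordered so that 'and' is min and 'or' is max

def ev(formula, v):
    """Evaluate a formula (atom name or ('and'|'or'|'not', args...)) under v."""
    if isinstance(formula, str):
        return v[formula]
    op, *args = formula
    if op == "not":
        return 2 - ev(args[0], v)  # swaps T and F, leaves B unchanged
    vals = [ev(x, v) for x in args]
    return min(vals) if op == "and" else max(vals)

def contension(kb, atoms):
    """Minimal number of B-valued atoms over all three-valued models of kb."""
    best = len(atoms)  # the all-B interpretation satisfies every formula
    for values in product([T, F, B], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(ev(f, v) >= B for f in kb):
            best = min(best, values.count(B))
    return best

K1 = [("and", "a", "b", "c"), ("and", ("not", "a"), ("not", "b"), ("not", "c"))]
K2 = [("and", "a", "b"), ("and", ("not", "a"), "b"), ("and", "a", ("not", "b"))]
```

With these definitions `contension(K1, ["a", "b", "c"])` evaluates to 3 and `contension(K2, ["a", "b"])` to 2, in line with the example.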


4 Inconsistency Measurement in Streams 

In the following, we discuss the problem of inconsistency measurement
in large knowledge bases. We address this issue by using a
stream-based approach of accessing the formulas of a large knowledge
base. Formulas of a knowledge base then need to be processed
one by one by a stream-based inconsistency measure. The goal of
this formalization is to obtain stream-based inconsistency measures
that approximate given inconsistency measures when the latter would
have been applied to the knowledge base as a whole. We first formalize
this setting and, afterwards, provide concrete approaches for
some inconsistency measures.

4.1 Problem Formalization 

We use a very simple formalization of a stream that is sufficient for 
our needs. 

Definition 8. A propositional stream $\mathcal{S}$ is a function $\mathcal{S} : \mathbb{N} \to
\mathcal{L}(\mathsf{At})$. Let $\mathbb{S}$ be the set of all propositional streams.

A propositional stream models a sequence of propositional formulas.
On a wider scope, a propositional stream can also be interpreted
as a very general abstraction of the output of a linked
open data crawler (such as LDSpider [?]) that crawls knowledge formalized
as RDF (Resource Description Framework) from the web,
enriched, e.g., with OWL semantics. We model large knowledge
bases by propositional streams that indefinitely repeat the formulas
of the knowledge base. For that, we assume for a knowledge
base $\mathcal{K} = \{\phi_1, \ldots, \phi_n\}$ the existence of a canonical enumeration
$\mathcal{K}^c = \langle \phi_1, \ldots, \phi_n \rangle$ of the elements of $\mathcal{K}$. This enumeration can be
arbitrary and has no specific meaning other than to enumerate the
elements in an unambiguous way.

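Definition 9. Let $\mathcal{K}$ be a knowledge base and $\mathcal{K}^c = \langle \phi_1, \ldots, \phi_n \rangle$
its canonical enumeration. The $\mathcal{K}$-stream $\mathcal{S}_{\mathcal{K}}$ is defined as
$\mathcal{S}_{\mathcal{K}}(i) = \phi_{(i \bmod n) + 1}$ for all $i \in \mathbb{N}$.

Given a $\mathcal{K}$-stream $\mathcal{S}_{\mathcal{K}}$ and an inconsistency measure $\mathcal{I}$ we aim at
defining a method that processes the elements of $\mathcal{S}_{\mathcal{K}}$ one by one and
approximates $\mathcal{I}(\mathcal{K})$.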
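A $\mathcal{K}$-stream in the sense of Definition 9 is simply an endless cyclic replay of the canonical enumeration. A minimal sketch, assuming the enumeration is given as a Python list and that positions start at 0 (the name `k_stream` is hypothetical):

```python
from itertools import islice

def k_stream(enumeration):
    """Yield S_K(0), S_K(1), ... where S_K(i) is the formula at 1-based
    position (i mod n) + 1 of the canonical enumeration."""
    n = len(enumeration)
    i = 0
    while True:
        yield enumeration[i % n]  # 0-based index i mod n
        i += 1

first_seven = list(islice(k_stream(["phi1", "phi2", "phi3"]), 7))
```

Here `first_seven` contains the three formulas twice in order, followed by `phi1` again.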

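Definition 10. A stream-based inconsistency measure $\mathcal{J}$ is a function
$\mathcal{J} : \mathbb{S} \times \mathbb{N} \to [0, \infty)$.

Definition 11. Let $\mathcal{I}$ be an inconsistency measure and $\mathcal{J}$ a stream-based
inconsistency measure. Then $\mathcal{J}$ approximates (or is an approximation
of) $\mathcal{I}$ if for all $\mathcal{K} \in \mathbb{K}$ we have $\lim_{i \to \infty} \mathcal{J}(\mathcal{S}_{\mathcal{K}}, i) = \mathcal{I}(\mathcal{K})$.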

4.2 A Naive Window-based Approach 

The simplest form of implementing a stream-based variant of any algorithm
or function is to use a window-based approach, i.e., to consider
at any time point a specific excerpt from the stream and apply
the original algorithm or function on this excerpt. For any propositional
stream $\mathcal{S}$ let $\mathcal{S}^{i,j}$ (for $i \le j$) be the knowledge base obtained
by taking the formulas from $\mathcal{S}$ between positions $i$ and $j$, i.e.,
$\mathcal{S}^{i,j} = \{\mathcal{S}(i), \ldots, \mathcal{S}(j)\}$.





















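Definition 12. Let $\mathcal{I}$ be an inconsistency measure, $w \in \mathbb{N} \cup \{\infty\}$,
and $g$ some function $g : [0, \infty) \times [0, \infty) \to [0, \infty)$ with $g(x,y) \in
[\min\{x,y\}, \max\{x,y\}]$. We define the naive window-based measure
$\mathcal{J}^{w,g}_{\mathcal{I}} : \mathbb{S} \times \mathbb{N} \to [0, \infty)$ via

\[
\mathcal{J}^{w,g}_{\mathcal{I}}(\mathcal{S}, i) =
\begin{cases}
0 & \text{if } i = 0 \\
g\bigl(\mathcal{I}(\mathcal{S}^{\max\{0,\,i-w\},\,i}),\ \mathcal{J}^{w,g}_{\mathcal{I}}(\mathcal{S}, i-1)\bigr) & \text{otherwise}
\end{cases}
\]

for every $\mathcal{S}$ and $i \in \mathbb{N}$.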

The function $g$ in the above definition is supposed to be an aggregation
function that combines the newly obtained inconsistency value
$\mathcal{I}(\mathcal{S}^{\max\{0,\,i-w\},\,i})$ with the previous value $\mathcal{J}^{w,g}_{\mathcal{I}}(\mathcal{S}, i-1)$. This function
can be, e.g., the maximum function $\max$ or a smoothing function
$g_\alpha(x,y) = \alpha x + (1 - \alpha) y$ for some $\alpha \in [0,1]$ (for every
$x, y \in [0, \infty)$).
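The recursion of Definition 12 can be sketched directly. In the illustration below the stream is a Python function from positions to formulas, and the plugged-in measure is a deliberately simple stand-in of our own (it counts atoms that occur together with their negation among literal formulas, which coincides with the number of MIs for knowledge bases consisting only of literals); none of these names come from the paper's implementation.

```python
def window_measure(measure, w, g):
    """Naive window-based measure: aggregate measure(window) step by step."""
    def J(stream, i):
        value = 0  # value at position 0
        for step in range(1, i + 1):
            lo = max(0, step - w)  # also works for w = float('inf')
            window = {stream(k) for k in range(lo, step + 1)}
            value = g(measure(window), value)
        return value
    return J

def complementary_pairs(window):
    """Toy measure on literal strings such as 'a' and '~a'."""
    return sum(1 for lit in window
               if not lit.startswith("~") and "~" + lit in window)

def smoothing(alpha):
    """Smoothing aggregation g_alpha(x, y) = alpha * x + (1 - alpha) * y."""
    return lambda x, y: alpha * x + (1 - alpha) * y

K = ["a", "b", "c", "~a", "d", "e"]
stream = lambda i: K[i % len(K)]

small = window_measure(complementary_pairs, 1, max)(stream, 20)
unbounded = window_measure(complementary_pairs, float("inf"), max)(stream, 20)
```

With a window covering only two positions, `a` and `~a` never appear together and `small` stays 0, whereas the unbounded window yields 1. This illustrates why a finite window cannot approximate a measure in general.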

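Proposition 5. Let $\mathcal{I}$ be an inconsistency measure, $w \in \mathbb{N} \cup \{\infty\}$,
and $g$ some function $g : [0, \infty) \times [0, \infty) \to [0, \infty)$ with $g(x,y) \in
[\min\{x,y\}, \max\{x,y\}]$.

1. If $w$ is finite then $\mathcal{J}^{w,g}_{\mathcal{I}}$ is not an approximation of $\mathcal{I}$.

2. If $w = \infty$ and $g(x,y) > \min\{x,y\}$ if $x \ne y$ then $\mathcal{J}^{w,g}_{\mathcal{I}}$ is an
approximation of $\mathcal{I}$.

3. $\mathcal{J}^{w,g}_{\mathcal{I}}(\mathcal{S}_{\mathcal{K}}, i) \le \mathcal{I}(\mathcal{K})$ for every $\mathcal{K} \in \mathbb{K}$ and $i \in \mathbb{N}$.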

4.3 Approximation Algorithms for $\mathcal{I}_{hs}$ and $\mathcal{I}_c$

The approximation algorithms for $\mathcal{I}_{hs}$ and $\mathcal{I}_c$ that are presented in
this subsection use concepts of the programming paradigms of
simulated annealing and genetic programming [?]. Both algorithms
follow the same idea and we will only formalize the one for $\mathcal{I}_{hs}$ and
give some hints on how to adapt it for $\mathcal{I}_c$.

The basic idea for the stream-based approximation of $\mathcal{I}_{hs}$ is as follows.
At any processing step we maintain a candidate set $C \in 2^{\mathsf{Int}(\mathsf{At})}$
(initialized with the empty set) that approximates a hitting set of the
underlying knowledge base. At the beginning of a processing step
we make a random choice (with decreasing probability the more formulas
we already encountered) whether to remove some element of
$C$. This action ensures that $C$ does not contain superfluous elements.
Afterwards we check whether there is still an interpretation in $C$ that
satisfies the currently encountered formula. If this is not the case we
add some random model of the formula to $C$. Finally, we update
the previously computed inconsistency value with $|C| - 1$, taking
also some aggregation function $g$ (as for the naive window-based
approach) into account. In order to increase the probability of
successfully finding a minimal hitting set we do not maintain a single
candidate set $C$ but a (multi-)set $Cand = \{C_1, \ldots, C_m\}$ for some
previously specified parameter $m \in \mathbb{N}$ and use the average size of
these candidate hitting sets.


Algorithm 1 update$^{m,g,f}_{hs}$(form)

 1: Initialize currentValue and Cand
 2: N = N + 1
 3: newValue = 0
 4: for all $C \in Cand$ do
 5:     rand $\in [0,1]$
 6:     if rand < f(N) then
 7:         Remove some random $\omega$ from $C$
 8:     if $\neg\exists \omega \in C : \omega \models$ form then
 9:         Add random $\omega \in \mathsf{Mod}$(form) to $C$
10:     newValue = newValue + $(|C| - 1)/|Cand|$
11: currentValue = g(newValue, currentValue)
12: return currentValue
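The update step of Algorithm 1 can be sketched in Python as follows. This is an illustration under our own assumptions, not the Tweety implementation: formulas are predicates over an interpretation dictionary, interpretations are stored as tuples of truth values, random models are drawn by rejection sampling (which presupposes satisfiable formulas), and the names `make_updater`, `g75`, and `f1` are hypothetical. The functions `g75` and `f1` mirror the smoothing function with $\alpha = 0.75$ and $f(n) = 1/(n+1)$.

```python
import random

def make_updater(m, g, f, atoms, rng):
    """Build update(form), keeping N, the current value, and m candidate sets."""
    state = {"N": 0, "value": 0.0, "cand": [set() for _ in range(m)]}

    def random_model(form):
        # Rejection sampling; assumes form has at least one model.
        while True:
            w = tuple(rng.random() < 0.5 for _ in atoms)
            if form(dict(zip(atoms, w))):
                return w

    def update(form):
        state["N"] += 1
        new_value = 0.0
        for C in state["cand"]:
            # Lines 5-7: occasionally drop a random interpretation.
            if C and rng.random() < f(state["N"]):
                C.remove(rng.choice(sorted(C)))
            # Lines 8-9: make sure the current formula is hit.
            if not any(form(dict(zip(atoms, w))) for w in C):
                C.add(random_model(form))
            # Line 10: average of |C| - 1 over all candidates.
            new_value += (len(C) - 1) / m
        state["value"] = g(new_value, state["value"])
        return state["value"]

    return update

g75 = lambda x, y: 0.75 * x + 0.25 * y
f1 = lambda n: 1.0 / (n + 1)
```

A typical use is to create one updater per stream and call it once per incoming formula, e.g. `update = make_updater(10, g75, f1, atoms, random.Random(0))`. Since the process is randomized, individual runs may settle on a hitting set that is not card-minimal.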


Definition 13. Let $m \in \mathbb{N}$, $g$ some function $g : [0, \infty) \times [0, \infty) \to
[0, \infty)$ with $g(x,y) \in [\min\{x,y\}, \max\{x,y\}]$, and $f : \mathbb{N} \to [0,1]$
some monotonically decreasing function with $\lim_{n \to \infty} f(n) = 0$.
We define $\mathcal{J}^{m,g,f}_{hs}$ via

\[
\mathcal{J}^{m,g,f}_{hs}(\mathcal{S}, i) =
\begin{cases}
0 & \text{if } i = 0 \\
\mathrm{update}^{m,g,f}_{hs}(\mathcal{S}(i)) & \text{otherwise}
\end{cases}
\]

for every $\mathcal{S}$ and $i \in \mathbb{N}$. The function $\mathrm{update}^{m,g,f}_{hs}$ is depicted in
Algorithm 1.

At the first call of the algorithm $\mathrm{update}^{m,g,f}_{hs}$ the value of
currentValue (which contains the currently estimated inconsistency
value) is initialized to 0 and the (multi-)set $Cand \subseteq 2^{\mathsf{Int}(\mathsf{At})}$
(which contains a population of candidate hitting sets) is initialized
with $m$ empty sets. The function $f$ can be any monotonically decreasing
function with $\lim_{n \to \infty} f(n) = 0$ (this ensures that every
candidate $C$ eventually reaches some stable result). The parameter $m$ increases
the probability that at least one of the candidate hitting sets attains
the global optimum of a card-minimal hitting set.

As $\mathcal{J}^{m,g,f}_{hs}$ is a random process we cannot show that $\mathcal{J}^{m,g,f}_{hs}$ is an
approximation of $\mathcal{I}_{hs}$ in the general case. However, we can give the
following result.

Proposition 6. For every probability $p \in [0,1)$, $g$ some function $g :
[0, \infty) \times [0, \infty) \to [0, \infty)$ with $g(x,y) \in [\min\{x,y\}, \max\{x,y\}]$
and $g(x,y) > \min\{x,y\}$ if $x \ne y$, a monotonically decreasing
function $f : \mathbb{N} \to [0,1]$ with $\lim_{n \to \infty} f(n) = 0$, and $\mathcal{K} \in \mathbb{K}$ there
is $m \in \mathbb{N}$ such that with probability greater or equal $p$ it is the case
that

\[
\lim_{i \to \infty} \mathcal{J}^{m,g,f}_{hs}(\mathcal{S}_{\mathcal{K}}, i) = \mathcal{I}_{hs}(\mathcal{K})
\]

This result states that $\mathcal{J}^{m,g,f}_{hs}$ indeed approximates $\mathcal{I}_{hs}$ if we
choose the number of populations large enough. In the next section
we will provide some empirical evidence that even for small values
of $m$ results are satisfactory.

Both Definition 13 and Algorithm 1 can be modified slightly in
order to approximate $\mathcal{I}_c$ instead of $\mathcal{I}_{hs}$, yielding a new measure
$\mathcal{J}^{m,g,f}_{c}$. For that, the set of candidates $Cand$ contains three-valued
interpretations instead of sets of classical interpretations. In line 7,
we do not remove an interpretation from $C$ but flip some arbitrary
proposition from $B$ to $T$ or $F$. Similarly, in line 9 we do not add
an interpretation but flip some propositions to $B$ in order to satisfy
the new formula. Finally, the inconsistency value is determined by
taking the number of $B$-valued propositions. For more details see
the implementations of both $\mathcal{J}^{m,g,f}_{hs}$ and $\mathcal{J}^{m,g,f}_{c}$, which will also be
discussed in the next section.


5 Empirical Evaluation 

In this section we describe our empirical experiments on runtime,
accuracy, and scalability of some stream-based inconsistency measures.
Our Java implementations (see footnote 4) have been added to the Tweety
Libraries for Knowledge Representation [?].

4 http://mthimm.de/r?r=tweety-inc-commons
  $\mathcal{I}_c$, $\mathcal{I}_{hs}$: http://mthimm.de/r?r=tweety-inc-pl
  $\mathcal{J}^{m,g,f}_{hs}$: http://mthimm.de/r?r=tweety-stream-hs
  $\mathcal{J}^{m,g,f}_{c}$: http://mthimm.de/r?r=tweety-stream-c
  Evaluation framework: http://mthimm.de/r?r=tweety-stream-eval









Table 2: Runtimes for the evaluated measures; each value is averaged over 100 random knowledge bases of 5000 formulas; the total runtime is
after 40000 iterations

Measure                                          RT (iteration)   RT (total)
$\mathcal{J}^{500,\max}_{\mathcal{I}_{MI}}$      198ms            133m
$\mathcal{J}^{1000,\max}_{\mathcal{I}_{MI}}$     359ms            240m
$\mathcal{J}^{2000,\max}_{\mathcal{I}_{MI}}$     14703ms          9812m
$\mathcal{J}^{500,\max}_{\mathcal{I}_{MI^C}}$    198ms            134m
$\mathcal{J}^{1000,\max}_{\mathcal{I}_{MI^C}}$   361ms            241m
$\mathcal{J}^{2000,\max}_{\mathcal{I}_{MI^C}}$   14812ms          9874m
$\mathcal{J}^{10,g_{0.75},f_1}_{c}$              0.16ms           6.406s
$\mathcal{J}^{100,g_{0.75},f_1}_{c}$             1.1ms            43.632s
$\mathcal{J}^{500,g_{0.75},f_1}_{c}$             5.21ms           208.422s
$\mathcal{J}^{10,g_{0.75},f_1}_{hs}$             0.07ms           2.788s
$\mathcal{J}^{100,g_{0.75},f_1}_{hs}$            0.24ms           9.679s
$\mathcal{J}^{500,g_{0.75},f_1}_{hs}$            1.02ms           40.614s
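The two runtime columns of Table 2 can be cross-checked against each other, since the total runtime refers to 40000 iterations. A rough consistency check on three of the reported rows (rounding of the per-iteration averages explains the small deviations):

```latex
\begin{align*}
0.07\,\text{ms} \times 40000 &= 2.8\,\text{s} \approx 2.788\,\text{s}, \\
0.16\,\text{ms} \times 40000 &= 6.4\,\text{s} \approx 6.406\,\text{s}, \\
198\,\text{ms} \times 40000 &= 7920\,\text{s} = 132\,\text{min} \approx 133\,\text{m}.
\end{align*}
```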


5.1 Evaluated Approaches 

For our evaluation, we considered the inconsistency measures $\mathcal{I}_{MI}$,
$\mathcal{I}_{MI^C}$, $\mathcal{I}_\eta$, $\mathcal{I}_c$, and $\mathcal{I}_{hs}$. We used the SAT solver lingeling (see footnote 5) for the
sub-problems of determining consistency and to compute a model of
a formula. For enumerating the set of MIs of a knowledge base (as
required by $\mathcal{I}_{MI}$ and $\mathcal{I}_{MI^C}$) we used MARCO (see footnote 6). The measure $\mathcal{I}_\eta$ was
implemented using the linear optimization solver lp_solve (see footnote 7). The
measures $\mathcal{I}_{MI}$, $\mathcal{I}_{MI^C}$, and $\mathcal{I}_\eta$ were used to define three different versions of
the naive window-based measure $\mathcal{J}^{w,g}_{\mathcal{I}}$ (with $w = 500, 1000, 2000$
and $g = \max$). For each of the measures $\mathcal{I}_c$ and $\mathcal{I}_{hs}$ we tested three
versions of their streaming variants $\mathcal{J}^{m,g_{0.75},f_1}_{c}$ and $\mathcal{J}^{m,g_{0.75},f_1}_{hs}$
(with $m = 10, 100, 500$) with $f_1 : \mathbb{N} \to [0,1]$ defined via
$f_1(i) = 1/(i + 1)$ for all $i \in \mathbb{N}$, where $g_{0.75}$ is the smoothing function
for $\alpha = 0.75$ as defined in the previous section.

5.2 Experiment Setup 

For measuring the runtime of the different approaches we generated
100 random knowledge bases in CNF (Conjunctive Normal Form)
with 5000 formulas (= disjunctions) and 30 propositions each. For
each generated knowledge base $\mathcal{K}$ we considered its $\mathcal{K}$-stream and
processing of the stream was aborted after 40000 iterations. We fed
the $\mathcal{K}$-stream to each of the evaluated stream-based inconsistency
measures and measured the average runtime per iteration and the total
runtime. For each iteration, we set a time-out of 2 minutes and
aborted processing of the stream completely if a time-out occurred.
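A generator for such random knowledge bases might look as follows. The text above does not specify the clause length or the sampling scheme, so the fixed clause length of three, the uniform choice of propositions and polarities, and the integer encoding of literals (positive or negative proposition index) are purely our assumptions; the function name is hypothetical.

```python
import random

def random_cnf_kb(num_formulas, num_props, clause_len, rng):
    """Return a list of clauses; each clause is a tuple of signed integers,
    where p stands for proposition p and -p for its negation."""
    kb = []
    for _ in range(num_formulas):
        props = rng.sample(range(1, num_props + 1), clause_len)
        kb.append(tuple(p if rng.random() < 0.5 else -p for p in props))
    return kb

kb = random_cnf_kb(5000, 30, 3, random.Random(1))
```

Note that the result is a list and may contain the same clause more than once, whereas a knowledge base is a set of formulas; deduplicating with `set(kb)` would restore that property at the cost of a slightly smaller size.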

In order to measure accuracy, for each of the considered approaches
we generated another 100 random knowledge bases with
specifically set inconsistency values (see footnote 8), used otherwise the same
settings as above, and measured the returned inconsistency values.

To evaluate the scalability of our stream-based approach of $\mathcal{I}_{hs}$ we
conducted a third experiment where we fixed the number of propositions
(60) and the specifically set inconsistency value (200) and
varied the size of the knowledge bases from 5000 to 50000 (with
steps of 5000 formulas). We measured the total runtime up to the
point when the inconsistency value was within a tolerance of $\pm 1$ of
the expected inconsistency value.

The experiments were conducted on a server with two Intel Xeon 
X5550 QuadCore (2.67 GHz) processors with 8 GB RAM running 
SUSE Linux 2.6. 

5 http://fmv.jku.at/lingeling/ 

6 http://sun.iwu.edu/~mliffito/marco/ 

7 http://lpsolve.sourceforge.net 

8 The sampling algorithms can be found at 

http://mthimm.de/r?r=tweety-sampler 

9 We did the same experiment with our stream-based approach of $\mathcal{I}_c$ but do
not report the results due to the similarity to $\mathcal{I}_{hs}$ and space restrictions.


5.3 Results 

Our first observation concerns the inconsistency measure $\mathcal{I}_\eta$ which
proved to be not suitable to work on large knowledge bases (see footnote 10).
Computing the value $\mathcal{I}_\eta(\mathcal{K})$ for some knowledge base $\mathcal{K}$ includes solving
a linear optimization problem over a number of variables which is
(in the worst case) exponential in the number of propositions of the
signature. In our setting with $|\mathsf{At}| = 30$ the generated optimization
problem therefore contained $2^{30} = 1073741824$ variables. Hence,
even the optimization problem itself could not be constructed within
the timeout of 2 minutes for every step. As we are not aware of any
more efficient implementation of $\mathcal{I}_\eta$, we will not report on further
results for $\mathcal{I}_\eta$ in the following.

As for the runtime of the naive window-based approaches of $\mathcal{I}_{MI}$
and $\mathcal{I}_{MI^C}$ and our stream-based approaches for $\mathcal{I}_c$ and $\mathcal{I}_{hs}$ see
Table 2. There one can see that $\mathcal{J}^{w,g}_{\mathcal{I}_{MI}}$ and $\mathcal{J}^{w,g}_{\mathcal{I}_{MI^C}}$ on the one hand, and
$\mathcal{J}^{m,g,f}_{c}$ and $\mathcal{J}^{m,g,f}_{hs}$ on the other hand, have comparable runtimes,
respectively. The former two have almost identical runtimes, which
is expected as the determination of the MIs is the main problem in
both their computations. Clearly, $\mathcal{J}^{m,g,f}_{c}$ and $\mathcal{J}^{m,g,f}_{hs}$ are significantly
faster per iteration (and in total) than $\mathcal{J}^{w,g}_{\mathcal{I}_{MI}}$ and $\mathcal{J}^{w,g}_{\mathcal{I}_{MI^C}}$: at most
a few milliseconds for the former and several hundreds or thousands
of milliseconds for the latter (for all variants of $m$ and $w$).
The impact of increasing $m$ for $\mathcal{J}^{m,g,f}_{c}$ and $\mathcal{J}^{m,g,f}_{hs}$ is expectedly
linear while the impact of increasing the window size $w$ for $\mathcal{J}^{w,g}_{\mathcal{I}_{MI}}$
and $\mathcal{J}^{w,g}_{\mathcal{I}_{MI^C}}$ is exponential (this is also clear as both solve an
FNP-hard problem).

As for the accuracy of the different approaches see Figure 1 (a)-(d).
There one can see that both $\mathcal{J}^{m,g,f}_{hs}$ and $\mathcal{J}^{m,g,f}_{c}$ (Figures 1a and
1b) converge quite quickly (almost right after the knowledge base
has been processed once) into a $[-1, 1]$ interval around the actual
inconsistency value, where $\mathcal{J}^{m,g,f}_{c}$ is even closer to it. The naive
window-based approaches (Figures 1c and 1d) perform comparably
badly (this is clear as those approaches cannot see all MIs
at any iteration due to the limited window size). Surprisingly, the
impact of larger values of $m$ for $\mathcal{J}^{m,g,f}_{hs}$ and $\mathcal{J}^{m,g,f}_{c}$ is rather small
in terms of accuracy which suggests that the random process of our
algorithm is quite robust. Even for $m = 10$ the results are quite
satisfactory.

As for the scalability of $\mathcal{J}^{m,g_{0.75},f_1}_{hs}$ see Figure 1e. There one can
observe a linear increase in the runtime of all variants wrt. the size
of the knowledge base. Furthermore, the difference between the variants
is also linear in the parameter $m$ (which is also clear as each
population is an independent random process). It is noteworthy that
the average runtime for $\mathcal{J}^{10,g_{0.75},f_1}_{hs}$ is about 66.1 seconds for knowledge
bases with 50000 formulas. As the significance of the parameter
$m$ for the accuracy is also only marginal, the measure $\mathcal{J}^{10,g_{0.75},f_1}_{hs}$
is clearly an effective and accurate stream-based inconsistency measure.

10 More precisely, our implementation of the measure proved to be not suitable
for this setting

6 Discussion and Conclusion 
Figure 1: (a)-(d): Accuracy performance for the evaluated measures
(dashed line is actual inconsistency value); each value is averaged
over 100 random knowledge bases of 5000 formulas (30 propositions)
with varying inconsistency values; (e): Evaluation of the scalability
of $\mathcal{J}^{m,g_{0.75},f_1}_{hs}$; each value is averaged over 10 random knowledge
bases of the given size


In this paper we discussed the issue of large-scale inconsistency measurement
and proposed novel approximation algorithms that are effective
for the streaming case. To the best of our knowledge, the
computational issues for measuring inconsistency, in particular with
respect to scalability problems, have not been addressed in the
literature before. One exception is the work by Ma and colleagues
[?] who present an anytime algorithm that approximates an inconsistency
measure based on a 4-valued paraconsistent logic (similar
to the contension inconsistency measure). The algorithm provides
lower and upper bounds for this measure and can be stopped at any
point in time with some guaranteed quality. The main difference between
our framework and the algorithm of [?] is that the latter needs
to process the whole knowledge base in each atomic step and is therefore
not directly applicable for the streaming scenario. The empirical
evaluation in [?] also suggests that our streaming variant of $\mathcal{I}_{hs}$ is
much more performant as Ma et al. report an average runtime of their
algorithm of about 240 seconds on a knowledge base with 120 formulas
and 20 propositions (no evaluation on larger knowledge bases
is given) while our measure has a runtime of only a few seconds for
knowledge bases with 5000 formulas with comparable accuracy. A
deeper comparison of these different approaches is planned for future
work.

Our work showed that inconsistency measurement is not only a
theoretical field but can actually be applied to problems of reasonable
size. In particular, our stream-based approaches for $\mathcal{I}_{hs}$ and $\mathcal{I}_c$
are accurate and effective for measuring inconsistencies in large
knowledge bases. Current and future work concerns the application of
our approach to linked open data sets [6].
A Proofs of technical results


Proposition 3. A consistent partitioning $\Phi$ is a card-minimal
partitioning of $\mathcal{K}$ if and only if $\mathcal{I}_{hs}(\mathcal{K}) = |\Phi| - 1$.

Proof. Let $\Phi = \{\Phi_1, \ldots, \Phi_n\}$ be a consistent partitioning and let
$\omega_i \in \mathrm{Int}(\mathrm{At})$ be such that $\omega_i \models \Phi_i$ (for $i = 1, \ldots, n$). Then
$\{\omega_1, \ldots, \omega_n\}$ is a hitting set of $\mathcal{K}$ and we have $h_{\mathcal{K}} \leq |\Phi|$. With the
same idea one obtains a consistent partitioning $\Phi$ from every hitting
set $H$ of $\mathcal{K}$ and thus $h_{\mathcal{K}} \geq |\Phi'|$ for every card-minimal partitioning
$\Phi'$ of $\mathcal{K}$. Hence, $\mathcal{I}_{hs}(\mathcal{K}) = |\Phi| - 1$ for every card-minimal partitioning
$\Phi$ of $\mathcal{K}$. □
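
The second direction of the proof (from a hitting set to a consistent partitioning) can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `partition_from_hitting_set`, the encoding of formulas as Python predicates over truth assignments, and the example knowledge base are assumptions made here for illustration only.

```python
# Hypothetical illustration: formulas are predicates over a truth assignment
# (a dict from atom names to booleans); names below are not from the paper.

def partition_from_hitting_set(kb, hitting_set):
    """Assign every formula to the first interpretation in hitting_set
    that satisfies it; each resulting part is consistent by construction."""
    parts = [[] for _ in hitting_set]
    for name, formula in kb.items():
        for index, world in enumerate(hitting_set):
            if formula(world):
                parts[index].append(name)
                break
        else:
            # No interpretation satisfies this formula: not a hitting set.
            raise ValueError(f"{name} is not hit by any interpretation")
    # Drop empty parts so the result is a proper partitioning.
    return [part for part in parts if part]


# Knowledge base {a, not a, b, not b} from the introduction.
K2 = {
    "a": lambda w: w["a"],
    "not a": lambda w: not w["a"],
    "b": lambda w: w["b"],
    "not b": lambda w: not w["b"],
}
H = [{"a": True, "b": True}, {"a": False, "b": False}]

PARTS = partition_from_hitting_set(K2, H)
print(PARTS)           # [['a', 'b'], ['not a', 'not b']]
print(len(PARTS) - 1)  # 1
```

Since no single interpretation satisfies both a and its negation, the two-element hitting set above is card-minimal, and the partitioning obtained from it has two parts, in line with the proposition.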

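Proposition 4. Let $\mathcal{K}$ be a knowledge base. If $\infty > \mathcal{I}_{hs}(\mathcal{K}) > 0$
then

$$1 - \frac{1}{\mathcal{I}_{hs}(\mathcal{K})} < \mathcal{I}_{\eta}(\mathcal{K}) \leq 1 - \frac{1}{\mathcal{I}_{hs}(\mathcal{K}) + 1}$$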


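Proposition 1. The function $\mathcal{I}_{hs}$ is a (basic) inconsistency measure.

Proof. We have to show that properties 1.), 2.), and 3.) of
Definition 3 are satisfied.

1. If $\mathcal{K}$ is consistent there is an $\omega \in \mathrm{Int}(\mathrm{At})$ such that $\omega \models \alpha$ for
every $\alpha \in \mathcal{K}$. Therefore, $H = \{\omega\}$ is a card-minimal hitting set
and we have $h_{\mathcal{K}} = 1$ and therefore $\mathcal{I}_{hs}(\mathcal{K}) = 0$. Note that for
inconsistent $\mathcal{K}$ we always have $h_{\mathcal{K}} > 1$.

2. Let $\mathcal{K} \subseteq \mathcal{K}'$ and let $H$ be a card-minimal hitting set of $\mathcal{K}'$. Then
$H$ is also a hitting set of $\mathcal{K}$ (not necessarily a card-minimal one).
Therefore, we have $h_{\mathcal{K}} \leq h_{\mathcal{K}'}$ and $\mathcal{I}_{hs}(\mathcal{K}) \leq \mathcal{I}_{hs}(\mathcal{K}')$.

3. Let $\alpha \in \mathrm{Free}(\mathcal{K})$ and define $\mathcal{K}' = \mathcal{K} \setminus \{\alpha\}$. Let $H$ be a
card-minimal hitting set of $\mathcal{K}'$ and let $\omega \in H$. Furthermore, let
$\mathcal{K}'' \subseteq \mathcal{K}'$ be the set of all formulas such that $\omega \models \beta$ for all
$\beta \in \mathcal{K}''$. It follows that $\mathcal{K}''$ is consistent. As $\alpha$ is a free formula
it follows that $\mathcal{K}'' \cup \{\alpha\}$ is also consistent (otherwise there would
be a minimal inconsistent subset of $\mathcal{K}''$ containing $\alpha$). Let $\omega'$ be a
model of $\mathcal{K}'' \cup \{\alpha\}$. Then $H' = (H \setminus \{\omega\}) \cup \{\omega'\}$ is a hitting set of
$\mathcal{K}$ and due to 2.) also card-minimal. Hence, we have $h_{\mathcal{K}'} = h_{\mathcal{K}}$ and
$\mathcal{I}_{hs}(\mathcal{K}') = \mathcal{I}_{hs}(\mathcal{K})$.

□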
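
To make the quantities $h_{\mathcal{K}}$ and $\mathcal{I}_{hs}$ used in this proof concrete, the following brute-force Python sketch computes $\mathcal{I}_{hs}$ for small propositional knowledge bases. It is an illustration written for this text, not the implementation evaluated in the paper; the function names and the encoding of formulas as Python predicates are assumptions, and the exhaustive search is only feasible for a handful of atoms.

```python
from itertools import combinations, product
from math import inf


def interpretations(atoms):
    """All truth assignments over the given atoms, as dicts."""
    return [dict(zip(atoms, values))
            for values in product([False, True], repeat=len(atoms))]


def i_hs(kb, atoms):
    """Brute-force hitting set measure: (size of a smallest set of
    interpretations that satisfies every formula at least once) - 1."""
    if not kb:
        return 0
    worlds = interpretations(atoms)
    # For every formula, the indices of the interpretations satisfying it.
    models = [frozenset(i for i, w in enumerate(worlds) if formula(w))
              for formula in kb]
    if any(not m for m in models):
        return inf  # a contradictory formula admits no hitting set
    for size in range(1, len(worlds) + 1):
        for chosen in combinations(range(len(worlds)), size):
            if all(m & set(chosen) for m in models):
                return size - 1
    raise AssertionError("unreachable: all formulas are satisfiable")


ATOMS = ["a", "b", "c", "d"]
# K1 = {a, b or c, not a and not b, d} and K2 = {a, not a, b, not b}
K1 = [lambda w: w["a"], lambda w: w["b"] or w["c"],
      lambda w: not w["a"] and not w["b"], lambda w: w["d"]]
K2 = [lambda w: w["a"], lambda w: not w["a"],
      lambda w: w["b"], lambda w: not w["b"]]

print(i_hs(K1, ATOMS))      # 1
print(i_hs(K2, ATOMS))      # 1
print(i_hs(K1[:2], ATOMS))  # 0, a consistent subset (property 1)
```

The last call also illustrates monotonicity (property 2): the value for the subset `K1[:2]` does not exceed the value for `K1`.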

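Proposition 2. The measure $\mathcal{I}_{hs}$ satisfies the following properties:

• If $\alpha \in \mathcal{K}$ is such that $\mathrm{At}(\alpha) \cap \mathrm{At}(\mathcal{K} \setminus \{\alpha\}) = \emptyset$ then $\mathcal{I}_{hs}(\mathcal{K}) =
\mathcal{I}_{hs}(\mathcal{K} \setminus \{\alpha\})$ (safe formula independence).

• If $\mathcal{K} \equiv^{\sigma} \mathcal{K}'$ then $\mathcal{I}_{hs}(\mathcal{K}) = \mathcal{I}_{hs}(\mathcal{K}')$ (irrelevance of syntax).

• If $\alpha \models \beta$ and $\alpha \not\models \bot$ then $\mathcal{I}_{hs}(\mathcal{K} \cup \{\alpha\}) \geq \mathcal{I}_{hs}(\mathcal{K} \cup \{\beta\})$
(dominance).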


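Proof of Proposition 4. For the right inequality, let $H$ be a
card-minimal hitting set of $\mathcal{K}$, i. e., we have $\mathcal{I}_{hs}(\mathcal{K}) = |H| - 1$. Define a
probability function $P : \mathrm{Int}(\mathrm{At}) \to [0,1]$ via $P(\omega) = 1/|H|$ for every
$\omega \in H$ and $P(\omega') = 0$ for every $\omega' \in \mathrm{Int}(\mathrm{At}) \setminus H$ (note that $P$ is
indeed a probability function). As $H$ is a hitting set of $\mathcal{K}$, we have
that $P(\phi) \geq 1/|H|$ for every $\phi \in \mathcal{K}$ as at least one model of $\phi$ gets
probability $1/|H|$ in $P$. So we have
$\mathcal{I}_{\eta}(\mathcal{K}) \leq 1 - 1/|H| = 1 - 1/(\mathcal{I}_{hs}(\mathcal{K}) + 1)$.

For the left inequality we only sketch a proof. Assume that
$\mathcal{I}_{\eta}(\mathcal{K}) \leq 1/2$, then we have to show that $\mathcal{I}_{hs}(\mathcal{K}) < 2$ which is
equivalent to $\mathcal{I}_{hs}(\mathcal{K}) \leq 1$ as the co-domain of $\mathcal{I}_{hs}$ is a subset of the
natural numbers. If $\mathcal{I}_{\eta}(\mathcal{K}) \leq 1/2$ then there is a probability function
$P$ with $P(\phi) \geq 1/2$ for all $\phi \in \mathcal{K}$. Let
$\Gamma_P = \{\omega \in \mathrm{Int}(\mathrm{At}) \mid P(\omega) > 0\}$ and observe
$\sum_{\omega \in \Gamma_P} P(\omega) = 1$. Without loss of generality assume that
$P(\omega) = P(\omega')$ for all $\omega, \omega' \in \Gamma_P$ (see footnote 12). Then every
$\phi \in \mathcal{K}$ has to be satisfied by at least half of the interpretations in
$\Gamma_P$ in order for $P(\phi) = \sum_{\omega \in \Gamma_P,\, \omega \models \phi} P(\omega) \geq 1/2$ to hold. Then
due to combinatorial reasons there have to be $\omega_1, \omega_2 \in \Gamma_P$ such that
either $\omega_1 \models \phi$ or $\omega_2 \models \phi$ for every $\phi \in \mathcal{K}$. Therefore, $\{\omega_1, \omega_2\}$ is
a hitting set and we have $\mathcal{I}_{hs}(\mathcal{K}) \leq 1$. By analogous reasoning we
obtain $\mathcal{I}_{hs}(\mathcal{K}) \leq 2$ if $\mathcal{I}_{\eta}(\mathcal{K}) \leq 2/3$ (and therefore $P(\phi) \geq 1/3$ for all
$\phi \in \mathcal{K}$) and the general case $\mathcal{I}_{hs}(\mathcal{K}) < i$ if $\mathcal{I}_{\eta}(\mathcal{K}) \leq (i - 1)/i$ and,
thus, the claim. Note finally that $\mathcal{I}_{\eta}(\mathcal{K}) = 1$ if and only if $\mathcal{K}$
contains a contradictory formula which is equivalent to
$\mathcal{I}_{hs}(\mathcal{K}) = \infty$ and thus ruled out. □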
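
As a worked illustration of the right inequality (added here, not part of the original proof), consider the knowledge base $\mathcal{K}_2 = \{a, \neg a, b, \neg b\}$ from the introduction. The names $\omega_1, \omega_2$ are chosen for this example, and the usual definition $\mathcal{I}_{\eta}(\mathcal{K}) = 1 - \max\{\xi \mid \exists P\, \forall \phi \in \mathcal{K}: P(\phi) \geq \xi\}$ is assumed.

```latex
\begin{align*}
H &= \{\omega_1, \omega_2\}, \quad
   \omega_1(a) = \omega_1(b) = \mathrm{true}, \quad
   \omega_2(a) = \omega_2(b) = \mathrm{false} \\
\mathcal{I}_{hs}(\mathcal{K}_2) &= |H| - 1 = 1 \\
P(\omega_1) = P(\omega_2) = \tfrac{1}{2}
  &\;\Longrightarrow\; P(a) = P(\neg a) = P(b) = P(\neg b) = \tfrac{1}{2} \\
\mathcal{I}_{\eta}(\mathcal{K}_2) &\leq 1 - \tfrac{1}{2}
  = 1 - \frac{1}{\mathcal{I}_{hs}(\mathcal{K}_2) + 1}
\end{align*}
```

Because $P(a) + P(\neg a) = 1$ for every probability function $P$, no $P$ can give both formulas a probability above $1/2$, so $\mathcal{I}_{\eta}(\mathcal{K}_2) = 1/2$ and the right bound is attained in this case. The left bound evaluates to $1 - 1/1 = 0 < 1/2$.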

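Corollary 1. If $\mathcal{I}_{\eta}(\mathcal{K}_1) \leq \mathcal{I}_{\eta}(\mathcal{K}_2)$ then $\mathcal{I}_{hs}(\mathcal{K}_1) \leq \mathcal{I}_{hs}(\mathcal{K}_2)$.

Proof. We show the contraposition of the claim, so assume
$\mathcal{I}_{hs}(\mathcal{K}_1) > \mathcal{I}_{hs}(\mathcal{K}_2)$ which is equivalent to
$\mathcal{I}_{hs}(\mathcal{K}_1) \geq \mathcal{I}_{hs}(\mathcal{K}_2) + 1$ as the co-domain of $\mathcal{I}_{hs}$ is a subset of
the natural numbers. By Proposition 4 we have

$$\mathcal{I}_{\eta}(\mathcal{K}_1) > 1 - \frac{1}{\mathcal{I}_{hs}(\mathcal{K}_1)} \geq 1 - \frac{1}{\mathcal{I}_{hs}(\mathcal{K}_2) + 1} \geq \mathcal{I}_{\eta}(\mathcal{K}_2)$$

which yields $\mathcal{I}_{\eta}(\mathcal{K}_1) > \mathcal{I}_{\eta}(\mathcal{K}_2)$. □

Proof of Proposition 2.

• This is satisfied as safe formula independence follows from free
formula independence.

• Let $H$ be a card-minimal hitting set of $\mathcal{K}$. So, for every $\alpha \in \mathcal{K}$
we have $\omega \in H$ with $\omega \models \alpha$. Due to $\alpha \equiv \sigma(\alpha)$ we also have
$\omega \models \sigma(\alpha)$ and, thus, for every $\beta \in \mathcal{K}'$ we have $\omega \in H$ with
$\omega \models \beta$. So $H$ is also a hitting set of $\mathcal{K}'$. Minimality follows from
the fact that $\sigma$ is a bijection.

• Let $H$ be a card-minimal hitting set of $\mathcal{K}_1 = \mathcal{K} \cup \{\alpha\}$ and let
$\omega \in H$ be such that $\omega \models \alpha$. Then we also have that $\omega \models \beta$ and
$H$ is also a hitting set of $\mathcal{K}_2 = \mathcal{K} \cup \{\beta\}$. Hence, $h_{\mathcal{K}_1} \geq h_{\mathcal{K}_2}$ and
$\mathcal{I}_{hs}(\mathcal{K}_1) \geq \mathcal{I}_{hs}(\mathcal{K}_2)$.

□
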

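Proposition 5. Let $\mathcal{I}$ be an inconsistency measure, $w \in \mathbb{N} \cup \{\infty\}$, and
$g$ some function $g : [0, \infty) \times [0, \infty) \to [0, \infty)$ with
$g(x, y) \in [\min\{x, y\}, \max\{x, y\}]$.

1. If $w$ is finite then $\mathcal{J}_{\mathcal{I}}^{w,g}$ is not an approximation of $\mathcal{I}$.

2. If $w = \infty$ and $g(x, y) > \min\{x, y\}$ if $x \neq y$ then $\mathcal{J}_{\mathcal{I}}^{w,g}$ is an
approximation of $\mathcal{I}$.

3. $\mathcal{J}_{\mathcal{I}}^{w,g}(\mathcal{S}_{\mathcal{K}}, i) \leq \mathcal{I}(\mathcal{K})$ for every $\mathcal{K} \in \mathbb{K}$ and $i \in \mathbb{N}$.

12 Otherwise let $k \in \mathbb{Q} \cap [0,1]$ be the least common denominator of all
$P(\omega)$, $\omega \in \Gamma_P$, and replace in $\Gamma_P$ every $\omega$ by $k$ duplicates of $\omega$ with
probability $P(\omega)/k$ each; for that note that $P$ can always be defined using
only rational numbers.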

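Proof.

1. Assume $\mathcal{K}$ is a minimal inconsistent set with $|\mathcal{K}| > w$. Then
$\mathcal{I}(\mathcal{S}^{\max\{0,i-w\},i}) = 0$ for all $i > 0$ (as every proper subset of $\mathcal{K}$ is
consistent) and $\mathcal{J}_{\mathcal{I}}^{w,g}(\mathcal{S}, i) = 0$ for all $i > 0$ as well. As $\mathcal{I}$ is an
inconsistency measure it holds $\mathcal{I}(\mathcal{K}) > 0$ and, hence, $\mathcal{J}_{\mathcal{I}}^{w,g}$ does
not approximate $\mathcal{I}$.

2. If $w = \infty$ we have $\mathcal{I}(\mathcal{S}^{\max\{0,i-w\},i}) = \mathcal{I}(\mathcal{K})$ for all $i > i_0$ for
some $i_0 \in \mathbb{N}$. As $g(x, y) > \min\{x, y\}$ for $x \neq y$ the value $\mathcal{I}(\mathcal{K})$ will be
approximated by $\mathcal{J}_{\mathcal{I}}^{w,g}$ eventually.

3. This follows from the fact that $\mathcal{I}$ is a basic inconsistency measure
and therefore satisfies $\mathcal{I}(\mathcal{K}) \leq \mathcal{I}(\mathcal{K}')$ for $\mathcal{K} \subseteq \mathcal{K}'$.

□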
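
Items 1 and 2 can be observed on a toy stream with the following Python sketch. It is an illustration under stated assumptions rather than the paper's implementation: the base measure is the drastic measure (0 for consistent, 1 for inconsistent sets), the window is taken to be the last `window` stream elements, `g` is `max`, and the stream simply cycles through the knowledge base; all function and variable names are hypothetical.

```python
from itertools import product

ATOMS = ["a", "b"]
# Minimal inconsistent set {a, not a or b, not b} with three formulas.
KB = [lambda w: w["a"],
      lambda w: (not w["a"]) or w["b"],
      lambda w: not w["b"]]


def drastic(formulas):
    """Assumed base measure: 0 if the formulas are jointly satisfiable, else 1."""
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if all(formula(world) for formula in formulas):
            return 0
    return 1


def windowed(stream, window, g=max):
    """Window-based stream measure: after each new formula, combine the base
    measure of the current window with the previous value via g.
    window=None stands for an unbounded window (w = infinity)."""
    value = 0
    history = []
    for i in range(1, len(stream) + 1):
        start = 0 if window is None else max(0, i - window)
        value = g(drastic(stream[start:i]), value)
        history.append(value)
    return history


STREAM = KB * 20  # deterministic stream cycling through the knowledge base

print(max(windowed(STREAM, 2)))    # 0, the window never sees all of KB
print(windowed(STREAM, None)[-1])  # 1, equals drastic(KB)
```

With a window of two formulas every window is a proper subset of the minimal inconsistent set and hence consistent, so the stream value stays at 0 although the knowledge base is inconsistent. With an unbounded window the value reaches that of the base measure after three formulas, and in both cases it never exceeds `drastic(KB)` (item 3).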

Proposition 6. For every probability $p \in [0, 1)$, $g$ some function
$g : [0, \infty) \times [0, \infty) \to [0, \infty)$ with $g(x, y) \in [\min\{x, y\}, \max\{x, y\}]$
and $g(x, y) > \min\{x, y\}$ if $x \neq y$, a monotonically decreasing
function $f : \mathbb{N} \to [0, 1]$ with $\lim_{n \to \infty} f(n) = 0$, and $\mathcal{K} \in \mathbb{K}$ there
is $m \in \mathbb{N}$ such that with probability greater or equal $p$ it is the case
that $\lim_{i \to \infty} \mathcal{J}_{hs}^{m,g,f}(\mathcal{S}_{\mathcal{K}}, i) = \mathcal{I}_{hs}(\mathcal{K})$.

Sketch. Consider the evolution of a single candidate set $C_1 \in \mathit{Cand}$
during the iterated execution of $\mathrm{update}_{hs}^{m,g,f}(\mathit{form})$, initialized
with the empty set $\emptyset$. Furthermore, let $\hat{C}$ be a card-minimal hitting
set of $\mathcal{K}$. In every iteration the probability of selecting one $\omega \in \hat{C}$
to be added to $C_1$ is greater than zero as at least one $\omega \in \hat{C}$ is a
model of the current formula. Furthermore, the probability of not
removing any interpretation $\omega' \in C_1$ is also greater than zero as $f$ is
monotonically decreasing (ignoring the very first step). Therefore,
the probability $p_1$ that $C_1$ evolves to $\hat{C}$ (and is not modified
thereafter) is greater than zero. Furthermore, the evolution of each
candidate set $C_i \in \mathit{Cand}$ is probabilistically independent of all other
evolutions and by considering more candidate sets, i. e., by setting
the value $m$ large enough, more candidate sets will evolve to some
card-minimal hitting set of $\mathcal{K}$ and the average cardinality of the
candidate sets approximates $\mathcal{I}_{hs}(\mathcal{K}) + 1$. □
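
The candidate-set evolution described in the sketch can be mimicked by the following small Python simulation. It is a hedged reconstruction from the description above, not the paper's algorithm: the function name `stream_hs`, the forgetting probability $f(n) = 1/(n+1)$, the smoothing function $g(x, y) = \alpha x + (1 - \alpha) y$ with $\alpha = 0.75$, and the brute-force model enumeration are all assumptions made for illustration.

```python
import random
from itertools import product


def models(formula, atoms):
    """All interpretations (tuples of truth values) that satisfy formula."""
    return [values for values in product([False, True], repeat=len(atoms))
            if formula(dict(zip(atoms, values)))]


def stream_hs(stream, atoms, m=10, alpha=0.75, seed=0):
    """Maintain m candidate hitting sets over a stream of formulas and
    return the smoothed average of (candidate set size - 1)."""
    rng = random.Random(seed)
    candidates = [set() for _ in range(m)]
    current = 0.0
    for n, formula in enumerate(stream, start=1):
        mods = models(formula, atoms)
        if not mods:
            return float("inf")  # contradictory formula: no hitting set
        new_value = 0.0
        for cand in candidates:
            # Assumed f(n) = 1/(n+1): forget one interpretation at random.
            if cand and rng.random() < 1.0 / (n + 1):
                cand.remove(rng.choice(sorted(cand)))
            # Ensure the candidate set satisfies the current formula.
            if not any(world in cand for world in mods):
                cand.add(rng.choice(mods))
            new_value += (len(cand) - 1) / m
        # Assumed g: convex combination of the new and the previous value.
        current = alpha * new_value + (1 - alpha) * current
    return current


CONSISTENT = [lambda w: w["a"]] * 50
INCONSISTENT = [lambda w: w["a"], lambda w: not w["a"]] * 100

print(stream_hs(CONSISTENT, ["a"]))    # 0.0
print(stream_hs(INCONSISTENT, ["a"]))  # a value between 0 and 1
```

For the consistent stream every candidate set always consists of the single model of `a`, so the result is exactly 0. For the stream alternating between `a` and its negation, candidate sets are rarely shrunk once $n$ is large, so the value is expected to move towards $\mathcal{I}_{hs} = 1$, although a finite random run gives no guarantee.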



